Efficiency of microSIMD architectures and index-mapped data for media processors
Abstract
We show that microSIMD architectures are more efficient for media processing than other parallel architectures like SIMD or MIMD parallel processor architectures, and VLIW or superscalar architectures. We define alternative mappings of data onto subwords, and show that the index mapping is an ideal mapping for achieving maximal subword parallelism with minimal revamping of the original serial loop code. We show an example where packed data loaded directly into registers from memory can be interpreted as index-mapped data rather than area-mapped data. This allows increased use of the subword parallelism provided by the microSIMD architecture, by exploiting data parallelism across loop iterations rather than within a loop. We also show how to convert rapidly between data mappings by using the Mix permutation instructions, first defined in the MAX-2 multimedia extensions for PA-RISC processors. We propose a new instruction, MixPair, which cuts by half the cost of parallel Mix functional units, while achieving maximum subword permutation performance.
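The ideas in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's code: a 64-bit register is modeled as an integer holding four 16-bit subwords, and the helper names (pack, unpack, padd, mix_left, mix_right) are hypothetical. The Mix semantics assumed here are that "mix left" interleaves the even-numbered subwords of two registers and "mix right" interleaves the odd-numbered ones. MixPair is not modeled, because the abstract does not define its behavior.

```python
MASK16 = 0xFFFF  # assumed subword width: 16 bits, four subwords per 64-bit register


def pack(subwords):
    """Pack four 16-bit values into one 64-bit integer, subword 0 leftmost."""
    assert len(subwords) == 4
    reg = 0
    for s in subwords:
        reg = (reg << 16) | (s & MASK16)
    return reg


def unpack(reg):
    """Return the four 16-bit subwords of a 64-bit integer, leftmost first."""
    return [(reg >> shift) & MASK16 for shift in (48, 32, 16, 0)]


def padd(x, y):
    """Parallel add: four independent 16-bit modular additions in one step."""
    return pack([(a + b) & MASK16 for a, b in zip(unpack(x), unpack(y))])


def mix_left(x, y):
    """Assumed Mix-left semantics: interleave even-numbered subwords of x and y."""
    a, b = unpack(x), unpack(y)
    return pack([a[0], b[0], a[2], b[2]])


def mix_right(x, y):
    """Assumed Mix-right semantics: interleave odd-numbered subwords of x and y."""
    a, b = unpack(x), unpack(y)
    return pack([a[1], b[1], a[3], b[3]])


# Serial loop being parallelized: for i in range(8): c[i] = a[i] + b[i]
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]

# Index-mapped view: subword k of each register belongs to loop iteration i + k,
# so one padd performs four iterations of the original loop body.
c = []
for i in range(0, len(a), 4):
    c += unpack(padd(pack(a[i:i + 4]), pack(b[i:i + 4])))

# Mix rearranges subwords across two registers, e.g. to regroup data mappings.
x = pack([1, 2, 3, 4])
y = pack([5, 6, 7, 8])
left = unpack(mix_left(x, y))
right = unpack(mix_right(x, y))
```

In this sketch the loop body is unchanged apart from operating on packed registers, which is the sense in which index mapping needs little rewriting of the serial code.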
Similar resources
Subword Permutation Instructions for Two-Dimensional Multimedia Processing in MicroSIMD Architectures
MicroSIMD architectures incorporating subword parallelism are very efficient for application-specific media processors as well as for fast multimedia information processing in general-purpose processors. This paper addresses the unsolved problem of the need to permute the subwords packed in registers for maximum parallelism performance, especially for two-dimensional (2-D) multimedia algorithms...
Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications
Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...
21 Processor Architectures For
In this chapter, we present contemporary VLSI processor architectures that support multimedia applications. We classified these processors into two groups: dedicated multimedia processors, which perform dedicated multimedia functions, such as MPEG encoding or decoding, and general-purpose processors that provide support for multimedia. Dedicated multimedia processors use either function-specifi...
For Embedded Applications with Data-level Parallelism, a Vector Processor Offers High Performance at Low Power Consumption and Low Design Complexity. Unlike Superscalar and VLIW Designs, a Vector Processor Is Scalable and Can Optimally Match Specific
Designers of embedded processors have typically optimized for low power consumption and low design complexity to minimize cost. Performance was a secondary consideration. Nowadays, many embedded systems (set-top boxes, game consoles, personal digital assistants, and cell phones) commonly perform computation-intensive media tasks such as video processing, speech transcoding, graphics, and high-b...
Scalable Vector Processors for Embedded Systems
Designers of embedded processors have typically optimized for low power consumption and low design complexity to minimize cost. Performance was a secondary consideration. Nowadays, many embedded systems (set-top boxes, game consoles, personal digital assistants, and cell phones) commonly perform computation-intensive media tasks such as video processing, speech transcoding, graphics, and high-b...
Publication date: 1989